Mshost stats #5588

Merged
nvazquez merged 130 commits into apache:main from shapeblue:mshostStats on Apr 22, 2022

@DaanHoogland
Contributor

DaanHoogland commented Oct 19, 2021

Description

This PR implements statistics for ManagementServers. The code queries the local MXBeans and the /proc filesystem for information on the system. It then publishes this to all ManagementServers, which keep a list of the recent state of each other. Static data is stored in the database, so if this ManagementServer is down, others can still query its last boot/start/stop/shutdown data (and possibly more).

The listManagementServerMetrics API gathers the data from the internal list and augments it with any static data from the DB.
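To illustrate the kind of data the /proc part of the collection can yield, here is a minimal sketch. It is not the PR's code: the actual collector is Java (`StatsCollector.java`), and every function name and dictionary key below is hypothetical. It assumes a Linux host and returns an empty result where /proc is unavailable.

```python
from pathlib import Path


def parse_loadavg(text: str) -> dict:
    """Parse the contents of /proc/loadavg into named load averages."""
    fields = text.split()
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
    }


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo lines such as 'MemTotal: 16384 kB' into integers."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a 'key: value' shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def read_proc(name: str):
    """Return the text of /proc/<name>, or None where it cannot be read."""
    try:
        return (Path("/proc") / name).read_text()
    except OSError:
        return None


def collect_local_stats() -> dict:
    """Gather a small, hypothetical subset of host statistics."""
    stats = {}
    loadavg = read_proc("loadavg")
    if loadavg:
        stats.update(parse_loadavg(loadavg))
    meminfo = read_proc("meminfo")
    if meminfo:
        mem = parse_meminfo(meminfo)
        stats["mem_total_kb"] = mem.get("MemTotal")
        stats["mem_free_kb"] = mem.get("MemFree")
    return stats
```

Keeping the parsers separate from the file reads means they can be exercised with literal strings, without depending on the machine the check runs on.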

Doc PR: apache/cloudstack-documentation#256

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

So far, only manual verification in a lab environment has been done.

DaanHoogland marked this pull request as draft October 19, 2021 12:40
Comment thread framework/cluster/src/main/java/com/cloud/cluster/ManagementServerStatusVO.java Outdated
Comment thread server/pom.xml Outdated
Comment thread server/pom.xml Outdated
Comment thread server/src/main/java/com/cloud/server/StatsCollector.java Outdated
Comment thread server/src/main/java/com/cloud/server/StatsCollector.java Outdated
apache deleted a comment from blueorangutan Oct 25, 2021
apache deleted a comment from blueorangutan Oct 25, 2021
Comment thread engine/schema/src/main/resources/META-INF/db/schema-41520to41600.sql Outdated
Comment thread framework/cluster/src/main/java/com/cloud/cluster/ClusterManager.java Outdated
apache deleted a comment from blueorangutan Oct 26, 2021
Comment thread framework/cluster/src/main/java/com/cloud/cluster/ManagementServerStatusVO.java Outdated
@DaanHoogland
Contributor Author

@nvazquez @DaanHoogland the following issues occur with the simulator:

FAIL: test_list_management_server_metrics (integration.smoke.test_metrics_api.TestMetrics)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rohit/lab/apache/cloudstack/test/integration/smoke/test_metrics_api.py", line 307, in test_list_management_server_metrics
    self.assertTrue(isinstance(metrics.lastserverstop, str))
AssertionError: False is not true

and

FAIL: test_list_usage_server_metrics (integration.smoke.test_metrics_api.TestMetrics)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/rohit/lab/apache/cloudstack/test/integration/smoke/test_metrics_api.py", line 326, in test_list_usage_server_metrics
    self.assertTrue(isinstance(metrics.lastheartbeat, str))
AssertionError: False is not true

So these are the same failures as in Travis, but they do not occur in a nested env :( That is not a hopeful prospect.

@yadvr
Member

yadvr commented Apr 20, 2022

In a simulator-based env, the mgmt server is never stopped; perhaps the value is false/null? In a Trillian env, the mgmt server is restarted.
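The failing assertion only reports `False is not true`, which hides what value the API actually returned. A minimal sketch of a more informative check, assuming the field is either absent or null in whichever environment never records it. The helper name and the timestamp value are hypothetical and not part of the test suite:

```python
from types import SimpleNamespace


def missing_string_fields(metrics, field_names):
    """Return the names of fields that are absent, None, or not a str."""
    return [
        name for name in field_names
        if not isinstance(getattr(metrics, name, None), str)
    ]


# Hypothetical responses: one server that never recorded a stop, one that did.
never_stopped = SimpleNamespace(lastserverstop=None)
restarted = SimpleNamespace(lastserverstop="2022-04-20T10:00:00+0000")

print(missing_string_fields(never_stopped, ["lastserverstop"]))  # prints ['lastserverstop']
print(missing_string_fields(restarted, ["lastserverstop"]))      # prints []
```

Asserting that the returned list is empty would name the offending field in the failure message instead of a bare boolean.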

@DaanHoogland
Contributor Author

> In a simulator-based env, the mgmt server is never stopped; perhaps the value is false/null? In a Trillian env, the mgmt server is restarted.

I think it is the other way around, but thanks for the clue. I think I fixed it now.

@DaanHoogland
Contributor Author

@blueorangutan package

@acs-robot

Found UI changes, kicking a new UI QA build
@blueorangutan ui

@blueorangutan

@acs-robot a Jenkins job has been kicked to build UI QA env. I'll keep you posted as I make progress.

@yadvr
Member

yadvr commented Apr 20, 2022

@blueorangutan package

@blueorangutan

@rohityadavcloud a Jenkins job has been kicked to build packages. It will be bundled with SystemVM template(s). I'll keep you posted as I make progress.

@blueorangutan

UI build: ✔️
Live QA URL: http://qa.cloudstack.cloud:8080/client/pr/5588 (SL-JID-1415)

@acs-robot

Found UI changes, kicking a new UI QA build
@blueorangutan ui

@blueorangutan

@acs-robot a Jenkins job has been kicked to build UI QA env. I'll keep you posted as I make progress.

@blueorangutan

UI build: ✔️
Live QA URL: http://qa.cloudstack.cloud:8080/client/pr/5588 (SL-JID-1416)

@acs-robot

PR Coverage Report

| CLASS | INSTRUCTION MISSED | INSTRUCTION COVERED | BRANCH MISSED | BRANCH COVERED | LINE MISSED | LINE COVERED |
|---|---|---|---|---|---|---|
| ConnectedAgentAttache | 149 | 0 | 20 | 0 | 40 | 0 |
| HostDaoImpl | 4983 | 0 | 180 | 0 | 803 | 0 |
| CloudStackContextLoaderListener | 77 | 0 | 2 | 0 | 21 | 0 |
| ListDbMetricsCmd | 22 | 0 | 0 | 0 | 7 | 0 |
| ListMgmtsMetricsCmd | 35 | 0 | 0 | 0 | 9 | 0 |
| ListUsageServerMetricsCmd | 22 | 0 | 0 | 0 | 7 | 0 |
| MetricsServiceImpl | 2130 | 0 | 134 | 0 | 446 | 0 |
| ClusterMetricsResponse | 523 | 0 | 132 | 0 | 60 | 0 |
| DbMetricsResponse | 43 | 0 | 0 | 0 | 21 | 0 |
| ManagementServerMetricsResponse | 117 | 0 | 0 | 0 | 47 | 0 |
| UsageServerMetricsResponse | 23 | 0 | 0 | 0 | 11 | 0 |
| ZoneMetricsResponse | 501 | 0 | 126 | 0 | 56 | 0 |
| ApiDBUtils | 2374 | 0 | 210 | 0 | 592 | 0 |
| ApiSessionListener | 125 | 0 | 2 | 0 | 32 | 0 |
| QueryManagerImpl | 14177 | 0 | 1242 | 0 | 2395 | 0 |
| ManagementServerJoinDaoImpl | 3 | 0 | 0 | 0 | 1 | 0 |
| ManagementServerJoinVO | 57 | 0 | 0 | 0 | 19 | 0 |
| ManagementServerHostStatsEntry | 319 | 0 | 2 | 0 | 132 | 0 |
| StatsCollector | 2178 | 0 | 124 | 0 | 359 | 0 |

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✖️ debian ✔️ suse15. SL-JID 3252

@DaanHoogland
Contributor Author

@blueorangutan test

@blueorangutan

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan
Copy link
Copy Markdown

Trillian test result (tid-3966)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 38820 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr5588-t3966-kvm-centos7.zip
Smoke tests completed. 93 look OK, 3 have errors
Only failed tests results shown below:

| Test | Result | Time (s) | Test File |
|---|---|---|---|
| test_01_add_primary_storage_disabled_host | Error | 0.74 | test_primary_storage.py |
| test_01_primary_storage_nfs | Error | 0.17 | test_primary_storage.py |
| ContextSuite context=TestStorageTags>:setup | Error | 0.31 | test_primary_storage.py |
| test_03_deploy_and_scale_kubernetes_cluster | Failure | 32.26 | test_kubernetes_clusters.py |
| test_07_deploy_kubernetes_ha_cluster | Failure | 66.06 | test_kubernetes_clusters.py |
| test_08_upgrade_kubernetes_ha_cluster | Failure | 41.31 | test_kubernetes_clusters.py |
| test_09_delete_kubernetes_ha_cluster | Failure | 38.25 | test_kubernetes_clusters.py |
| ContextSuite context=TestKubernetesCluster>:teardown | Error | 127.84 | test_kubernetes_clusters.py |
| test_hostha_enable_ha_when_host_in_maintenance | Error | 302.97 | test_hostha_kvm.py |

@github-actions

This pull request has merge conflicts. Dear author, please fix the conflicts and sync your branch with the base branch.

@nvazquez
Contributor

Hi @DaanHoogland can you please fix the conflict?

@DaanHoogland
Contributor Author

fixed @nvazquez

@DaanHoogland
Contributor Author

@blueorangutan package

@blueorangutan

@DaanHoogland a Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result: ✔️ el7 ✔️ el8 ✔️ debian ✔️ suse15. SL-JID 3273

@nvazquez
Contributor

Merging based on approvals and test results, ignoring known intermittent failures.
